Intention Recognition
MIRAGE: Multimodal Intention Recognition and Admittance-Guided Enhancement in VR-based Multi-object Teleoperation
Sun, Chi, Wang, Xian, Kumar, Abhishek, Cui, Chengbin, Lee, Lik-Hang
This is the author's version of the article. To appear in an IEEE ISMAR conference. The Hong Kong Polytechnic University.

Figure 1: A pictorial description of the MIRAGE framework, which enhances HRI tele-grasping capability for multiple objects in VR. MIRAGE divides the multi-object grasping task into two phases: movement (manual) and grasping (semi-automatic), each with a dedicated assistance method. In the movement (manual) phase, Virtual Admittance (VA) modifies the robot trajectory (b); compared to the non-VA condition (a), the same hand movement drives the robot toward the target more easily. In the grasping (semi-automatic) phase, a Multimodal-CNN-based Human Intention Perception Network (MMIPN) estimates the human's desired grasp position for the robot's grasp motion plan (d), whereas the non-MMIPN condition plans the grasping motion as a vertical downward path (c).

Effective human-robot interaction (HRI) in multi-object teleoperation tasks faces significant challenges due to perceptual ambiguities in virtual reality (VR) environments and the limitations of single-modality intention recognition. This paper proposes a shared control framework that combines a virtual admittance (VA) model with a Multimodal-CNN-based Human Intention Perception Network (MMIPN) to enhance teleoperation performance and user experience. The VA model employs artificial potential fields to guide operators toward target objects by adjusting the admittance force and optimizing motion trajectories. MMIPN processes multimodal inputs (gaze movement, robot motions, and environmental context) to estimate human grasping intentions, helping overcome depth perception challenges in VR. Gaze data emerged as the most crucial input modality. These findings demonstrate the effectiveness of combining multimodal cues with implicit guidance in VR-based teleoperation, providing a robust solution for multi-object grasping tasks and enabling more natural interactions across various applications in the future.

With the rapid development of robotics and, in particular, metaverse technology, teleoperation has gained diverse modes and expanded opportunities for remote operations. In aerospace manipulator operation [28, 45], extraterrestrial ground exploration [8], nuclear environment maintenance [46, 15], remote medical surgery [62, 12], and life care assistance [44], teleoperation already has a wide range of technical needs and successful application experience. The rise of metaverse technology has further promoted the application of virtual reality (VR) in industrial teleoperation [67, 9, 48], whose immersion can provide a more realistic experience for the operator.
- Asia > China > Hong Kong (0.60)
- North America > United States > Missouri > Jackson County > Kansas City (0.14)
- Europe > Spain (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
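To make the virtual admittance idea concrete, below is a minimal sketch of an attractive potential field feeding an admittance law of the form M·a + D·v = F_human + F_guidance. The point-mass model, gains, and purely attractive field are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

def guidance_force(pos, target, k_att=2.0, max_force=5.0):
    """Attractive potential-field force pulling the end effector toward the target."""
    f = k_att * (target - pos)
    n = np.linalg.norm(f)
    return f if n <= max_force else f * (max_force / n)

def admittance_step(pos, vel, f_human, target, dt=0.01, mass=1.0, damping=4.0):
    """One Euler step of the admittance law M*a + D*v = F_human + F_guidance."""
    f_total = f_human + guidance_force(pos, target)
    acc = (f_total - damping * vel) / mass
    vel = vel + acc * dt
    return pos + vel * dt, vel

# A constant, slightly off-axis operator push: the guidance field keeps the
# tool near the target instead of letting it drift along the push direction.
pos, vel = np.zeros(3), np.zeros(3)
target = np.array([0.5, 0.2, 0.3])
for _ in range(2000):
    pos, vel = admittance_step(pos, vel, np.array([0.5, 0.0, 0.0]), target)
print(np.round(pos, 3))  # settles near the target, offset by f_human / k_att
```

A repulsive term for obstacles could be added the same way; the force clipping keeps the guidance from overpowering the operator's own input.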
Uncertainty-Resilient Active Intention Recognition for Robotic Assistants
Saborío, Juan Carlos, Vinci, Marc, Lima, Oscar, Stock, Sebastian, Niecksch, Lennart, Günther, Martin, Sung, Alexander, Hertzberg, Joachim, Atzmüller, Martin
Purposeful behavior in robotic assistants requires the integration of multiple components and technological advances. Often, the problem is reduced to recognizing explicit prompts, which limits autonomy, or is oversimplified through assumptions such as near-perfect information. We argue that a critical gap remains unaddressed: the challenge of reasoning about the uncertain outcomes and perception errors inherent to human intention recognition. In response, we present a framework designed to be resilient to uncertainty and sensor noise, integrating real-time sensor data with a combination of planners. Our integrated framework has been successfully tested on a physical robot with promising results. Robotic assistants may be integrated into modern industrial environments, e.g., delivering tools, parts, or modules interleaved with tidying the workspace. Such tasks, however, require a combination of robust planning, navigation, grasping, and perception, particularly when explicit commands are not available and the robot must identify and pursue goals in collaborative spaces shared with people.
- Europe > Germany > Lower Saxony (0.14)
- Europe > Czechia > Prague (0.04)
- Asia > Middle East > Israel (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling > Plan Recognition (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.74)
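The noisy-perception problem this framework targets rests on a standard building block: a Bayes filter over candidate intentions. The intents, observation symbols, and likelihoods below are invented for illustration; the actual framework integrates this kind of update with its planners:

```python
import numpy as np

def update_belief(belief, obs, obs_model):
    """One Bayes-filter step: P(intent | obs) ∝ P(obs | intent) · P(intent)."""
    posterior = belief * obs_model[:, obs]
    s = posterior.sum()
    return belief if s == 0 else posterior / s  # keep prior if obs is impossible

intents = ["fetch_tool", "tidy_workspace", "idle"]
# Rows = intents, columns = observation symbols. The sensor is noisy: each
# intent mostly emits "its own" symbol but leaks probability to the others.
obs_model = np.array([[0.70, 0.20, 0.10],
                      [0.20, 0.70, 0.10],
                      [0.15, 0.15, 0.70]])
belief = np.full(3, 1 / 3)
for obs in [0, 0, 1, 0]:  # a noisy observation stream favouring fetch_tool
    belief = update_belief(belief, obs, obs_model)
print(dict(zip(intents, np.round(belief, 3))))
```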
Last Layer Hamiltonian Monte Carlo
Vellenga, Koen, Steinhauer, H. Joe, Falkman, Göran, Andersson, Jonas, Sjögren, Anders
We explore the use of Hamiltonian Monte Carlo (HMC) sampling as a probabilistic last-layer approach for deep neural networks (DNNs). While HMC is widely regarded as a gold standard for uncertainty estimation, its computational demands limit its application to large-scale datasets and large DNN architectures. Although the predictions from the sampled DNN parameters can be parallelized, the computational cost still scales linearly with the number of samples (similar to an ensemble). Last-layer HMC (LL-HMC) reduces the required computations by restricting the HMC sampling to the final layer of a DNN, making it applicable to more data-intensive scenarios with limited computational resources. In this paper, we compare LL-HMC against five last-layer probabilistic deep learning (LL-PDL) methods across three real-world video datasets for driver action and intention recognition. We evaluate the in-distribution classification performance, calibration, and out-of-distribution (OOD) detection. Due to the stochastic nature of the probabilistic evaluations, we performed five grid searches with different random seeds to avoid relying on a single initialization for the hyperparameter configurations. The results show that LL-HMC achieves competitive in-distribution classification and OOD detection performance. Additional sampled last-layer parameters do not improve the classification performance but can improve the OOD detection. Multiple chains or starting positions did not yield consistent improvements.
- North America > United States (0.14)
- Asia > Middle East > Jordan (0.04)
- Asia > China (0.04)
- Information Technology (0.67)
- Automobiles & Trucks (0.46)
- Transportation (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
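A minimal sketch of the LL-HMC idea: sample only the weights of a (here, logistic) output layer over frozen penultimate features. The toy data, step size, and trajectory length are assumptions for illustration, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def potential_and_grad(w, phi, y, prior_var=1.0):
    """Negative log posterior (and gradient) of a Bayesian logistic last layer."""
    z = phi @ w
    p = 0.5 * (1.0 + np.tanh(0.5 * z))          # numerically stable sigmoid
    nll = np.sum(np.logaddexp(0.0, z) - y * z)  # Bernoulli negative log-likelihood
    U = nll + 0.5 * w @ w / prior_var           # Gaussian prior on the weights
    return U, phi.T @ (p - y) + w / prior_var

def hmc_step(w, phi, y, step=0.01, n_leapfrog=20):
    """One HMC transition over last-layer weights only; features stay frozen."""
    p = rng.normal(size=w.shape)
    U0, grad = potential_and_grad(w, phi, y)
    H0 = U0 + 0.5 * p @ p
    w_new, p = w.copy(), p - 0.5 * step * grad   # first leapfrog half-step
    for i in range(n_leapfrog):
        w_new = w_new + step * p
        U1, grad = potential_and_grad(w_new, phi, y)
        p = p - step * grad * (0.5 if i == n_leapfrog - 1 else 1.0)
    H1 = U1 + 0.5 * p @ p
    accept = np.log(rng.random()) < H0 - H1       # Metropolis correction
    return w_new if accept else w

# Toy features standing in for a frozen penultimate layer.
phi = rng.normal(size=(200, 5))
y = (phi @ rng.normal(size=5) > 0).astype(float)

w, samples = np.zeros(5), []
for _ in range(300):
    w = hmc_step(w, phi, y)
    samples.append(w)
print(np.round(np.mean(samples[100:], axis=0), 2))  # posterior-mean weights
```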
MindEye-OmniAssist: A Gaze-Driven LLM-Enhanced Assistive Robot System for Implicit Intention Recognition and Task Execution
Zhang, Zejia, Yang, Bo, Chen, Xinxing, Shi, Weizhuang, Wang, Haoyuan, Luo, Wei, Huang, Jian
A promising approach to effective human-robot interaction in assistive robotic systems is gaze-based control. However, current gaze-based assistive systems mainly help users with basic grasping actions, offering limited support. Moreover, their restricted intent recognition capability constrains the assistive system's ability to provide diverse assistance functions. In this paper, we propose an open implicit intention recognition framework powered by a Large Language Model (LLM) and a Vision Foundation Model (VFM), which can process gaze input and recognize user intents that are not confined to predefined or specific scenarios. Furthermore, we implement a gaze-driven LLM-enhanced assistive robot system (MindEye-OmniAssist) that recognizes the user's intentions through gaze and assists in completing tasks. To achieve this, the system utilizes an open-vocabulary object detector, an intention recognition network, and an LLM to infer the user's full intentions. By integrating eye-movement feedback and the LLM, it generates action sequences to assist the user in completing tasks. Real-world experiments have been conducted for assistive tasks, and the system achieved an overall success rate of 41/55 across various undefined tasks. Preliminary results show that the proposed method has the potential to provide a more user-friendly human-computer interaction interface and to significantly enhance the versatility and effectiveness of assistive systems by supporting more complex and diverse tasks.
- Asia > China > Hubei Province > Wuhan (0.04)
- Asia > China > Chongqing Province > Chongqing (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
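The gaze-to-LLM pipeline can be sketched as: detect objects, resolve the gaze point to an object, and prompt an LLM for an action sequence. Every interface below (Detection, gaze_target, the prompt format) is hypothetical, not the paper's actual API; the real system uses an open-vocabulary detector and a dedicated intention recognition network:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    center: tuple  # (x, y) pixel coordinates from the object detector

def gaze_target(detections, gaze_xy):
    """Resolve the gaze point to the detection whose center lies closest to it."""
    return min(detections, key=lambda d: (d.center[0] - gaze_xy[0]) ** 2
                                         + (d.center[1] - gaze_xy[1]) ** 2)

def build_prompt(target, scene_labels):
    return (f"The user is looking at: {target.label}. "
            f"Scene contains: {', '.join(scene_labels)}. "
            "Infer the likely intention and output a robot action sequence, "
            "one action per line.")

detections = [Detection("cup", (320, 240)), Detection("kettle", (500, 200))]
target = gaze_target(detections, gaze_xy=(335, 250))
print(build_prompt(target, [d.label for d in detections]))
# The prompt would be sent to the LLM; the returned action lines would then be
# parsed and dispatched to the robot.
```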
Intention Recognition in Real-Time Interactive Navigation Maps
Zhao, Peijie, Arefin, Zunayed, Meneguzzi, Felipe, Pereira, Ramon Fraga
In this demonstration, we develop IntentRec4Maps, a system to recognise users' intentions in interactive maps for real-world navigation. IntentRec4Maps uses the Google Maps Platform as the real-world interactive map, and a very effective approach for recognising users' intentions in real-time. We showcase the recognition process of IntentRec4Maps using two different Path-Planners and a Large Language Model (LLM). GitHub: https://github.com/PeijieZ/IntentRec4Maps
- South America > Brazil (0.15)
- Europe > United Kingdom > England > Greater Manchester > Manchester (0.05)
- Asia > China > Hong Kong (0.05)
- Europe > United Kingdom > England > Greater London > London > Kensington and Chelsea (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
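One common way to recognize navigation intentions with path planners, in the spirit of cost-based goal recognition, is to compare, for each candidate destination, the cost of the best route overall against the best route consistent with the movement observed so far. The destinations, costs, and temperature below are illustrative assumptions, not IntentRec4Maps internals:

```python
import numpy as np

def goal_posterior(cost_with_obs, cost_optimal, beta=1.0):
    """Cost-difference goal recognition: destinations whose best route barely
    deviates from the observed partial route get higher probability."""
    delta = np.array(cost_with_obs) - np.array(cost_optimal)
    w = np.exp(-beta * delta)
    return w / w.sum()

goals = ["station", "museum", "cafe"]
cost_optimal = [10.0, 12.0, 8.0]    # planner cost of the best route to each goal
cost_with_obs = [10.5, 18.0, 14.0]  # best route also covering the observed prefix
print(dict(zip(goals, np.round(goal_posterior(cost_with_obs, cost_optimal), 3))))
# -> the station dominates: the observed movement costs it almost nothing extra
```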
Fitting Different Interactive Information: Joint Classification of Emotion and Intention
Li, Xinger, Zhong, Zhiqiang, Huang, Bo, Yang, Yang
This paper presents the first-place solution for ICASSP MEIJU@2025 Track I, which focuses on low-resource multimodal emotion and intention recognition. Two points are key to the competition: how to effectively utilize a large amount of unlabeled data, and how to ensure that tasks of different difficulty levels promote each other in the interaction stage. In this paper, pseudo-labels are generated by a model trained on the labeled data, and high-confidence samples, together with their labels, are selected to alleviate the low-resource problem. At the same time, the experimentally observed property that intention is comparatively easy to represent is exploited so that intention recognition and emotion recognition mutually promote each other under different attention heads, and higher intention recognition performance is achieved through fusion. Finally, with the refined data processing, we achieve a score of 0.5532 on the test set and win the championship of the track.
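The confidence-based pseudo-label selection described above reduces, at its core, to thresholding the model's predicted probabilities on unlabeled data. A minimal sketch (the threshold value is an assumption; the paper does not state its cutoff here):

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.90):
    """Keep unlabeled samples whose top predicted probability clears the
    threshold; return their indices and hard pseudo-labels."""
    conf = probs.max(axis=1)
    keep = conf >= threshold
    return np.nonzero(keep)[0], probs[keep].argmax(axis=1)

probs = np.array([[0.97, 0.02, 0.01],   # confident  -> kept
                  [0.40, 0.35, 0.25],   # ambiguous  -> discarded
                  [0.05, 0.94, 0.01]])  # confident  -> kept
idx, labels = select_pseudo_labels(probs)
print(idx, labels)  # -> [0 2] [0 1]
```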
Towards Intention Recognition for Robotic Assistants Through Online POMDP Planning
Saborío, Juan Carlos, Hertzberg, Joachim
Intention recognition, or the ability to anticipate the actions of another agent, plays a vital role in the design and development of automated assistants that can support humans in their daily tasks. In particular, industrial settings pose interesting challenges that include potential distractions for a decision-maker as well as noisy or incomplete observations. In such a setting, a robotic assistant tasked with helping and supporting a human worker must interleave information gathering actions with proactive tasks of its own, an approach that has been referred to as active goal recognition. In this paper we describe a partially observable model for online intention recognition, show some preliminary experimental results and discuss some of the challenges present in this family of problems.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling > Plan Recognition (0.94)
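The interleaving of information gathering with proactive action can be caricatured by a toy decision rule over the current belief: observe while the intention is ambiguous, commit once one hypothesis dominates. A real POMDP policy would come from an online solver rather than a fixed threshold; the threshold here is an assumption:

```python
import numpy as np

def entropy(b):
    b = b[b > 0]
    return float(-(b * np.log(b)).sum())

def choose_action(belief, act_threshold=0.8):
    """Gather information while the intention is ambiguous; commit to a
    proactive assistive task once one hypothesis dominates."""
    if belief.max() >= act_threshold:
        return f"assist_with_intent_{int(belief.argmax())}"
    return "observe"  # information-gathering action

belief = np.array([0.5, 0.3, 0.2])
print(choose_action(belief), f"(entropy={entropy(belief):.2f})")  # -> observe
```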
Learning Multimodal Confidence for Intention Recognition in Human-Robot Interaction
Zhao, Xiyuan, Li, Huijun, Miao, Tianyuan, Zhu, Xianyi, Wei, Zhikai, Song, Aiguo
The rapid development of collaborative robotics has opened a new possibility for helping the elderly who have difficulties in daily life, allowing robots to operate according to specific intentions. However, efficient human-robot cooperation requires natural, accurate, and reliable intention recognition in shared environments. The paramount challenge here is reducing the uncertainty of the fused multimodal intention estimate and adaptively reasoning toward a more reliable result under the current interaction conditions. In this work we propose a novel learning-based multimodal fusion framework, Batch Multimodal Confidence Learning for Opinion Pool (BMCLOP). Our approach combines a Bayesian multimodal fusion method with a batch confidence learning algorithm to improve accuracy, uncertainty reduction, and success rate under the given interaction conditions. Moreover, this generic and practical multimodal intention recognition framework can easily be extended further. Our target assistive scenarios consider three modalities, gestures, speech, and gaze, each of which produces a categorical distribution over the finite set of intentions. The proposed method is validated with a six-DoF robot through extensive experiments and exhibits high performance compared to baselines.
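An opinion pool over per-modality categorical distributions can be sketched as a log-linear (weighted geometric) fusion, with each modality's weight playing the role of a learned confidence. The fixed weights and distributions below are illustrative assumptions; BMCLOP learns the confidences in batches:

```python
import numpy as np

def weighted_opinion_pool(modal_dists, confidences):
    """Log-linear opinion pool: fuse per-modality categorical distributions,
    scaling each modality's influence by its confidence weight."""
    log_fused = sum(w * np.log(p + 1e-12) for p, w in zip(modal_dists, confidences))
    fused = np.exp(log_fused - log_fused.max())  # subtract max for stability
    return fused / fused.sum()

gesture = np.array([0.6, 0.3, 0.1])
speech  = np.array([0.2, 0.7, 0.1])
gaze    = np.array([0.5, 0.4, 0.1])
print(np.round(weighted_opinion_pool([gesture, speech, gaze], [0.9, 0.5, 0.7]), 3))
```

A low-confidence modality (speech here, weighted 0.5) pulls the fused distribution less than the high-confidence ones, which is exactly the behavior a learned confidence is meant to produce.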
Intelligent Mode-switching Framework for Teleoperation
Kizilkaya, Burak, She, Changyang, Zhao, Guodong, Imran, Muhammad Ali
Teleoperation can be very difficult due to limited perception, high communication latency, and limited degrees of freedom (DoFs) at the operator side. Autonomous teleoperation is proposed to overcome this difficulty by predicting user intentions and performing some parts of the task autonomously to decrease the demand on the operator and increase the task completion rate. However, decision-making for mode-switching is generally assumed to be done by the operator, which brings an extra DoF to be controlled by the operator and introduces extra mental demand. On the other hand, the communication perspective is not investigated in the current literature, although communication imperfections and resource limitations are the main bottlenecks for teleoperation. In this study, we propose an intelligent mode-switching framework by jointly considering mode-switching and communication systems. User intention recognition is done at the operator side. Based on user intention recognition, a deep reinforcement learning (DRL) agent is trained and deployed at the operator side to seamlessly switch between autonomous and teleoperation modes. A real-world data set is collected from our teleoperation testbed to train both user intention recognition and DRL algorithms. Our results show that the proposed framework can achieve up to 50% communication load reduction with improved task completion probability.
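A heavily simplified stand-in for the DRL mode-switching agent: a tabular, bandit-style learner mapping discretized (intention confidence, network delay) states to a teleoperation/autonomy decision. The toy reward is an assumption; the paper trains a full DRL agent on real testbed data:

```python
import numpy as np

rng = np.random.default_rng(1)
# States: discretized (intention confidence, network delay); actions: 0 = teleop, 1 = auto.
Q = np.zeros((3, 3, 2))

def toy_reward(conf, delay, action):
    """Stand-in for the testbed: autonomy pays off when intention confidence and
    delay are high; manual control is preferable when both are low."""
    return (conf + delay - 2.0) if action == 1 else (2.0 - delay) * 0.5

for _ in range(5000):
    c, d = rng.integers(3), rng.integers(3)
    a = rng.integers(2) if rng.random() < 0.1 else int(Q[c, d].argmax())
    Q[c, d, a] += 0.1 * (toy_reward(c, d, a) - Q[c, d, a])  # bandit-style update
print(Q.argmax(axis=2))  # learned mode for each (confidence, delay) cell
```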
Designing deep neural networks for driver intention recognition
Vellenga, Koen, Steinhauer, H. Joe, Karlsson, Alexander, Falkman, Göran, Rhodin, Asli, Koppisetty, Ashok
Driver intention recognition studies increasingly rely on deep neural networks. Deep neural networks have achieved top performance on many different tasks, but it is not common practice to explicitly analyse the complexity and performance of the network's architecture. Therefore, this paper applies neural architecture search to investigate the effects of the deep neural network architecture on a real-world safety-critical application with limited computational capabilities. We explore a pre-defined search space for three deep neural network layer types capable of handling sequential data (a long short-term memory, a temporal convolution, and a time-series transformer layer), and the influence of different data fusion strategies on driver intention recognition performance. A set of eight search strategies is evaluated on two driver intention recognition datasets. For the two datasets, we observed that no search strategy clearly samples better deep neural network architectures. However, performing an architecture search does improve the model performance compared to the original manually designed networks. Furthermore, we observe no relation between increased model complexity and higher driver intention recognition performance. The results indicate that multiple architectures yield similar performance, regardless of the deep neural network layer type or fusion strategy.
- Europe > Sweden (0.04)
- Oceania > Australia (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Transportation (0.70)
- Automobiles & Trucks (0.68)
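The search procedure can be sketched as sampling configurations from a space over layer type, width, depth, and fusion strategy, then keeping the best-scoring one. The space values and the pseudo-scoring below are assumptions standing in for actual training runs, and plain random search stands in for whichever of the eight strategies the paper evaluates:

```python
import random

rng = random.Random(0)

# Illustrative search space: the three sequential layer types discussed in the
# paper plus width, depth, and fusion strategy. The value grids are assumptions.
search_space = {
    "layer": ["lstm", "tcn", "transformer"],
    "hidden_units": [32, 64, 128],
    "depth": [1, 2, 3],
    "fusion": ["early", "late"],
}

def evaluate(cfg):
    """Placeholder for training the candidate and returning validation accuracy;
    here, a deterministic pseudo-score keyed on the configuration."""
    return random.Random(str(sorted(cfg.items()))).uniform(0.6, 0.9)

candidates = [{k: rng.choice(v) for k, v in search_space.items()} for _ in range(50)]
best = max(candidates, key=evaluate)
print(best, round(evaluate(best), 3))
```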